overlap score
- North America > United States (0.29)
- North America > Canada (0.04)
- Information Technology (0.68)
- Government (0.47)
- Semiconductors & Electronics (0.46)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
Khan, Zaid, Farhadi, Ali, Krishna, Ranjay, Weihs, Luca, Bansal, Mohit, Gupta, Tanmay
When a human requests an LLM to complete a coding task using functionality from a large code repository, how do we provide context from the repo to the LLM? One approach is to add the entire repo to the LLM's context window. However, most tasks involve only a fraction of the symbols in a repo, longer contexts are detrimental to the LLM's reasoning abilities, and context windows are not unlimited. Alternatively, we could emulate the human ability to navigate a large repo, pick out the right functionality, and form a plan to solve the task. We propose MutaGReP (Mutation-guided Grounded Repository Plan Search), an approach to search for plans that decompose a user request into natural language steps grounded in the codebase. MutaGReP performs neural tree search in plan space, exploring by mutating plans and using a symbol retriever for grounding. On the challenging LongCodeArena benchmark, our plans use less than 5% of the 128K context window for GPT-4o but rival the coding performance of GPT-4o with a context window filled with the repo. Plans produced by MutaGReP allow Qwen 2.5 Coder 32B and 72B to match the performance of GPT-4o with full repo context and enable progress on the hardest LongCodeArena tasks. Project page: zaidkhan.me/MutaGReP
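The abstract describes a tree search over plans, where candidate plans are mutated and each natural-language step is grounded to repository symbols by a retriever. A minimal toy sketch of that loop is below; the repo symbols, the word-overlap retriever, the `mutate` candidates, and the scoring are all illustrative stand-ins, not MutaGReP's actual components (which use an LLM mutator and a neural symbol retriever).

```python
import heapq
import itertools

# Hypothetical toy repo: in the real system these would be symbols mined
# from a large codebase, and retrieval would be neural, not word overlap.
REPO_SYMBOLS = ["load_dataset", "train_model", "evaluate_model", "save_checkpoint"]

def retrieve(step):
    """Ground a natural-language step to its best-matching repo symbol."""
    def overlap(sym):
        return len(set(step.lower().split()) & set(sym.split("_")))
    best = max(REPO_SYMBOLS, key=overlap)
    return best, overlap(best)

def score(plan):
    """Plan quality = total grounding strength of its steps (a stand-in)."""
    return sum(retrieve(step)[1] for step in plan)

def mutate(plan):
    """Stand-in for LLM-proposed mutations: append unseen candidate steps."""
    for step in ["load the dataset", "train the model", "evaluate the model"]:
        if step not in plan:
            yield plan + [step]

def plan_search(request, budget=10):
    """Best-first tree search in plan space: expand the best-grounded plan,
    push its mutations, and keep the highest-scoring plan seen so far."""
    counter = itertools.count()  # tie-breaker so the heap never compares lists
    best_plan, best_score = [request], score([request])
    frontier = [(-best_score, next(counter), best_plan)]
    for _ in range(budget):
        if not frontier:
            break
        neg, _, plan = heapq.heappop(frontier)
        if -neg > best_score:
            best_score, best_plan = -neg, plan
        for child in mutate(plan):
            heapq.heappush(frontier, (-score(child), next(counter), child))
    return best_plan

plan = plan_search("train and evaluate a model on the dataset")
```

The search is execution-free: plans are scored purely by how well their steps ground to the repo, not by running any code.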
- North America > United States > Texas > Bee County (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Workflow (0.67)
- Research Report > New Finding (0.46)
Can AI mimic the human ability to define neologisms?
An ongoing and intriguing debate focuses on whether Large Language Models (LLMs) can replicate human language. The literature presents mixed evidence on this matter. Several studies suggest that LLMs can generate text closely resembling human language (Bubeck et al., 2023; Clark et al., 2021; Georgiou, 2025). However, the widely accepted concept of a universal grammar inherent in humans (Chomsky, 2000) challenges the idea that machine cognition can mirror human cognition. According to Chomsky et al. (2023), models like ChatGPT function as statistical engines driven by pattern recognition. Supporting this perspective, other studies highlight significant differences between human cognition and LLMs, which are reflected in language (Cai et al., 2024; Georgiou, 2024; Herbold et al., 2023). For instance, Georgiou (2024) examined how various linguistic components are represented in human-written and AI-generated texts, assessing the ability of ChatGPT to emulate human writing. The author found that despite AI-generated texts appearing to mimic human language, the results revealed significant differences across multiple linguistic features in the domains of phonology, grammar, and semantics.
- North America > Canada > Alberta (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Greece > Central Macedonia > Thessaloniki (0.05)
- (2 more...)
Mining Action Rules for Defect Reduction Planning
Oueslati, Khouloud, Laberge, Gabriel, Lamothe, Maxime, Khomh, Foutse
Defect reduction planning plays a vital role in enhancing software quality and minimizing software maintenance costs. By training a black box machine learning model and "explaining" its predictions, explainable AI for software engineering aims to identify the code characteristics that impact maintenance risks. However, post-hoc explanations do not always faithfully reflect what the original model computes. In this paper, we introduce CounterACT, a Counterfactual ACTion rule mining approach that can generate defect reduction plans without black-box models. By leveraging action rules, CounterACT provides a course of action that can be considered a counterfactual explanation for the class (e.g., buggy or not buggy) assigned to a piece of code. We compare the effectiveness of CounterACT with the original action rule mining algorithm and six established defect reduction approaches on 9 software projects. Our evaluation is based on (a) overlap scores between proposed code changes and actual developer modifications; (b) improvement scores in future releases; and (c) the precision, recall, and F1-score of the plans. Our results show that, compared to competing approaches, CounterACT's explainable plans achieve higher overlap scores at the release level (median 95%) and commit level (median 85.97%), and they offer a better trade-off between precision and recall (median F1-score 88.12%). Finally, we venture beyond planning and explore leveraging Large Language Models (LLMs) for generating code edits from our generated plans. Our results show that suggested LLM code edits supported by our plans are actionable and are more likely to pass relevant test cases than vanilla LLM code recommendations.
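The evaluation above compares a plan's proposed code changes against the modifications developers actually made. One plausible way to operationalize such an overlap score is a Jaccard index over (metric, direction) change pairs; the sketch below is an illustrative assumption, not CounterACT's exact definition, and the metric names are made up for the example.

```python
# Hedged sketch: Jaccard overlap between a plan's proposed changes and the
# changes developers actually made. The formulation and metric names are
# illustrative assumptions, not the paper's exact overlap-score definition.

def overlap_score(proposed, actual):
    """Jaccard overlap between two sets of (metric, direction) changes."""
    if not proposed and not actual:
        return 1.0  # both empty: trivially identical
    return len(proposed & actual) / len(proposed | actual)

# Hypothetical plan vs. hypothetical developer changes for one release:
plan = {("loc", "decrease"), ("complexity", "decrease"), ("fan_out", "decrease")}
dev  = {("loc", "decrease"), ("complexity", "decrease"), ("coupling", "decrease")}
print(overlap_score(plan, dev))  # 2 shared / 4 total = 0.5
```

A set-based score like this rewards plans that touch the same code characteristics as real maintenance work, independent of the order in which changes are proposed.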
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
Binomial Tails for Community Analysis
Madani, Omid, Ngo, Thanh, Zeng, Weifei, Averine, Sai Ankith, Evuru, Sasidhar, Malhotra, Varun, Gandham, Shashidhar, Yadav, Navindra
Automated discovery of candidate communities in networks finds a variety of applications in physical and social sciences (biological and biochemical networks, physical and virtual human networks) [1, 2]. Given a graph representing binary relations among nodes, informally and intuitively, a community corresponds to a subgraph, i.e. a subset of nodes, with relatively high edge density among the community members (nodes of the subgraph), and comparatively lower density of edges going outside the community. Defining communities more precisely, characterizing overall community structure in various domains, and designing efficient, robust algorithms for uncovering such communities in networks have been the subject of much research [1, 3]. In our use-case, we are interested in the automated discovery and effective presentation of candidate communities composed of computers (hosts) in an enterprise network. In particular, this effort is a component of a tool that provides a user, such as a security administrator of an organization, visibility into their complex network, and importantly helps the user partition the network into groups corresponding to geographic partitions, different departments, and hosts running different applications in the organization. This partitioning and naming of the groups is a necessary step in defining and maintaining network security policies, aka network segmentation: hosts in different groups (segments) can only communicate on a few well-defined and restricted channels. Such policy enforcement severely limits penetration and spread of malware and hackers. This step of grouping hosts and assigning meaningful names/labels to the groups, with the human in the loop, is also highly useful in generating insights, for example in uncovering broad patterns of communication among applications, not just for security but also for network optimization.
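The title and abstract suggest scoring candidate communities by how surprising their internal edge density is under a null model, via a binomial tail probability. A minimal sketch of that idea follows; the null model (a fixed background edge probability) and the exact statistic are assumptions for illustration, and may differ from the paper's formulation.

```python
import math

# Hedged sketch of the core idea implied by the title: score a candidate
# community by the binomial tail probability of observing at least its
# internal edge count under a background edge probability p. The null
# model here is an illustrative assumption, not the paper's exact statistic.

def binomial_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p), computed exactly from the pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def community_surprise(num_nodes, internal_edges, background_p):
    """Smaller tail probability = denser-than-chance subgraph = stronger
    candidate community. Pairs counted for an undirected simple graph."""
    possible = num_nodes * (num_nodes - 1) // 2
    return binomial_tail(internal_edges, possible, background_p)

# 8 hosts sharing 20 of the 28 possible edges, against a background
# density of 0.1, yield an astronomically small tail probability:
print(community_surprise(8, 20, 0.1))
```

Ranking subgraphs by this tail probability naturally penalizes small dense subgraphs less than a raw density threshold would, since the statistic accounts for how many edges were possible.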
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > India (0.04)
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.68)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining (0.93)
SemEval-2013 Task 4: Free Paraphrases of Noun Compounds
Hendrickx, Iris, Nakov, Preslav, Szpakowicz, Stan, Kozareva, Zornitsa, Séaghdha, Diarmuid Ó, Veale, Tony
In this paper, we describe SemEval-2013 Task 4: the definition, the data, the evaluation and the results. The task is to capture some of the meaning of English noun compounds via paraphrasing. Given a two-word noun compound, the participating system is asked to produce an explicitly ranked list of its free-form paraphrases. The list is automatically compared and evaluated against a similarly ranked list of paraphrases proposed by human annotators, recruited and managed through Amazon's Mechanical Turk. The comparison of raw paraphrases is sensitive to syntactic and morphological variation. The "gold" ranking is based on the relative popularity of paraphrases among annotators. To make the ranking more reliable, highly similar paraphrases are grouped, so as to downplay superficial differences in syntax and morphology. Three systems participated in the task. They all beat a simple baseline on one of the two evaluation measures, but not on both measures. This shows that the task is difficult.
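The abstract notes that raw paraphrase comparison is sensitive to syntactic and morphological variation, so paraphrases are normalized and credited by annotator popularity. The sketch below shows one simple way such an evaluation could work; the normalization rules and the scoring formula are illustrative assumptions, not the task's official measures, and the example paraphrases are invented.

```python
import re

# Hedged sketch: score a system's ranked paraphrase list against a gold
# list weighted by annotator popularity. Normalization and scoring are
# illustrative, not SemEval-2013 Task 4's official evaluation measures.

def normalize(p):
    """Downplay superficial syntax/morphology differences."""
    p = p.lower().strip()
    p = re.sub(r"\b(a|an|the)\b", "", p)  # drop articles
    return " ".join(p.split())            # collapse whitespace

def score_system(system_ranked, gold_counts):
    """Credit each matched paraphrase by its annotator popularity,
    normalized by the total popularity mass in the gold list."""
    gold = {normalize(g): c for g, c in gold_counts.items()}
    matched = sum(gold.get(normalize(s), 0) for s in system_ranked)
    return matched / sum(gold.values())

# Invented gold paraphrases for the compound "olive oil", with counts of
# how many annotators proposed each:
gold = {"oil found in olives": 5, "oil made from olives": 3, "oil of the olive": 1}
system = ["oil made from the olives", "oil found in olives"]
print(score_system(system, gold))  # (3 + 5) / 9 ~= 0.889
```

Grouping paraphrases after normalization is what lets "oil made from the olives" match the gold entry "oil made from olives" despite the superficial difference.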
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.29)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- (14 more...)